XML based Keyword Search
Abstract
The success of information-retrieval-style (IR-style) keyword search on the web has led to the emergence of keyword search over XML data. Because XML databases differ from text databases, this raises three new challenges: 1) the user's search intention must be identified, i.e., the XML node types that the user wants to search for and to search via; 2) a keyword may match a tag name, a tag value, or the structure of tags, and these cases must be distinguished; and 3) a new scoring function is needed to estimate how relevant each search result (a fragment of the XML document) is to the given query. However, existing systems cannot address these challenges, which leads to low-quality results in terms of query relevance. In this paper, an IR-style approach is proposed that uses statistics of the underlying XML data to address these challenges. First, specific guidelines are proposed that a search engine should meet in both search-intention identification and relevance-oriented ranking of search results. Then, based on these guidelines, a novel XML TF*IDF ranking strategy is proposed that ranks the individual matches of all possible search intentions.
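The two steps described above (inferring the search-for node type from statistics of the data, then ranking individual matches with a TF*IDF-style score) can be illustrated with a small Python sketch. It is not the paper's XML TF*IDF formula: the confidence heuristic, the decay factor `r`, the per-type IDF, and all function names (`index_nodes`, `infer_search_for_type`, `search`) are hypothetical choices made for illustration. The sketch assumes that a node's type is its label path from the root and that both tag names and text values count as keyword matches.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def tokens(text):
    """Lower-case word tokens; None is treated as empty text."""
    return re.findall(r"\w+", (text or "").lower())


def index_nodes(root):
    """Return (node_type, element, subtree_token_counts) for every element.

    Assumption: a node's type is its label path from the root (e.g. '/bib/book'),
    and tag names as well as text values count as searchable tokens.
    """
    result = []

    def visit(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        counts = Counter(tokens(elem.tag)) + Counter(tokens(elem.text))
        for child in elem:
            counts += visit(child, path)           # keywords anywhere in the subtree
            counts += Counter(tokens(child.tail))  # text after a child belongs to elem
        result.append((path, elem, counts))
        return counts

    visit(root, "")
    return result


def infer_search_for_type(nodes, keywords, r=0.8):
    """Guess which node type the user is searching for (hypothetical heuristic).

    f[T][k] = number of T-typed nodes whose subtree contains keyword k.
    A type scores 0 unless every keyword occurs in at least one of its nodes;
    more matching nodes raise the score, and r ** depth damps deeper types.
    """
    f = defaultdict(Counter)
    for node_type, _, counts in nodes:
        for k in keywords:
            if counts[k] > 0:
                f[node_type][k] += 1
    best, best_score = None, 0.0
    for node_type, per_kw in f.items():
        depth = node_type.count("/")
        if depth == 1:
            continue  # skip the document root: it trivially contains every keyword
        coverage = math.prod(per_kw[k] for k in keywords)
        score = math.log(1 + coverage) * r ** depth
        if score > best_score:
            best, best_score = node_type, score
    return best, f


def search(xml_text, query):
    """Infer the target node type, then rank its nodes by a TF*IDF-style score."""
    keywords = list(dict.fromkeys(tokens(query)))  # de-duplicate, keep order
    nodes = index_nodes(ET.fromstring(xml_text))
    node_type, f = infer_search_for_type(nodes, keywords)
    if node_type is None:
        return None, []
    candidates = [(elem, counts) for t, elem, counts in nodes if t == node_type]
    # IDF is computed among nodes of the chosen type only.
    idf = {k: math.log(1 + len(candidates) / f[node_type][k]) for k in keywords}
    ranked = []
    for elem, counts in candidates:
        score = sum((1 + math.log(counts[k])) * idf[k]
                    for k in keywords if counts[k] > 0)
        if score > 0:
            ranked.append((score, elem))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return node_type, ranked


doc = """<bib>
  <book><title>XML keyword search</title><author>Smith</author></book>
  <book><title>Database systems</title><author>Jones</author></book>
  <book><title>XML ranking and XML search</title><author>Lee</author></book>
</bib>"""

node_type, results = search(doc, "XML search")
print(node_type)                                  # /bib/book
print([e.findtext("title") for _, e in results])  # ['XML ranking and XML search', 'XML keyword search']
```

Computing IDF within the chosen node type, rather than over the whole document, reflects the idea that a keyword's rarity should be judged among candidate results of the same kind. The sketch does not model search-via node types or keyword ambiguity between tag names and values; those would need further, equally hypothetical, extensions.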
Similar resources
From Revisiting the LCA-based Approach to a New Semantics-based Approach for XML Keyword Search
Most keyword search approaches for data-centric XML documents are based on the computation of Lowest Common Ancestors (LCA), such as SLCA and MLCA. In this paper, we show that the LCA is not always a correct search model for processing keyword queries over general XML data. In particular, when an XML database contains relationships among objects, which is quite common in practical data, LCA-bas...
Path-based keyword search over XML streams
Recently, a great deal of attention has focused on processing keyword search over XML and XML streams. Keyword search is simple and provides a user-friendly way of retrieving required data from XML data. Despite its popularity, there is a concern over its efficiency. For this reason, several methods have been proposed to enable keyword search over XML streams. However, most of them ...
Processing XML Keyword Search by Constructing Effective Structured Queries
Recently, keyword search has attracted a great deal of attention in XML databases. It is hard to directly improve the relevance of XML keyword search because many keyword-matched nodes may not contribute to the results. To address this challenge, in this paper we design an adaptive XML keyword search approach, called XBridge, that can derive the semantics of a keyword query and generate a set...
An Evaluation Study of Search Algorithms for XML Streams
Keyword-based searching services over XML streams are essential for widely used streaming applications, such as dissemination services, sensor networks and stock market quotes. However, XML stream keyword search algorithms are usually schema dependent and do not allow pure keyword queries. Furthermore, ranking methods are still relatively unexploited in such algorithms. This paper presents an a...
From Structure-Based to Semantics-Based: Towards Effective XML Keyword Search
Existing XML keyword search approaches can be categorized into tree-based search and graph-based search. Both of them are structure-based search because they mainly rely on exploring the structural features of the document. Those structure-based approaches cannot fully exploit hidden semantics in XML documents. This causes serious problems in processing some classes of keyword queries. In thi...
ICRA: Effective Semantics for Ranked XML Keyword Search
Keyword search is a user-friendly way to query XML databases. Most previous efforts in this area focus on keyword proximity search in XML based on either a tree data model or a graph (or digraph) data model. The tree data model for XML is generally simple and efficient for keyword proximity search. However, it cannot capture connections such as ID references in XML databases. In contrast, technique...